

Generalized Eigenvalue Problems with Generative Priors

Neural Information Processing Systems

Generalized eigenvalue problems (GEPs) find applications in various fields of science and engineering. For example, principal component analysis, Fisher's discriminant analysis, and canonical correlation analysis are specific instances of GEPs and are widely used in statistical data processing. In this work, we study GEPs under generative priors, assuming that the underlying leading generalized eigenvector lies within the range of a Lipschitz continuous generative model. Under appropriate conditions, we show that any optimal solution to the corresponding optimization problems attains the optimal statistical rate. Moreover, from a computational perspective, we propose an iterative algorithm called the Projected Rayleigh Flow Method (PRFM) to approximate the optimal solution. We theoretically demonstrate that under suitable assumptions, PRFM converges linearly to an estimated vector that achieves the optimal statistical rate. Numerical results are provided to demonstrate the effectiveness of the proposed method.
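To make the iteration template concrete, below is a minimal NumPy sketch of a projected Rayleigh-flow-style update for the GEP max_x x^T A x / x^T B x: a gradient ascent step on the Rayleigh quotient followed by a projection onto the range of the generative model. The projection oracle `project_to_range`, the step size, and the iteration count are illustrative assumptions; the paper's actual PRFM update and its analysis may differ.

```python
import numpy as np

def prfm(A, B, project_to_range, x0, step=0.1, iters=100):
    """Sketch of a projected Rayleigh-flow-style iteration for the GEP
    max_x x^T A x / x^T B x under a generative prior.

    `project_to_range` is a user-supplied (hypothetical) oracle mapping a
    vector onto the range of the generative model.
    """
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        rayleigh = (x @ A @ x) / (x @ B @ x)                      # current Rayleigh quotient
        grad = 2.0 * (A @ x - rayleigh * (B @ x)) / (x @ B @ x)   # its gradient
        x = x + step * grad                                       # ascent ("Rayleigh flow") step
        x = project_to_range(x)                                   # project onto the generative prior
        x = x / np.linalg.norm(x)                                 # keep the iterate normalised
    return x

# Usage with a trivial prior (identity projection), purely for illustration:
# d = 5; A = np.eye(d); B = np.eye(d)
# v = prfm(A, B, project_to_range=lambda z: z, x0=np.random.randn(d))
```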



Reviews: Optimal Statistical Rates for Decentralised Non-Parametric Regression with Linear Speed-Up

Neural Information Processing Systems

Update: I have read the author response and appreciate that they addressed some of my comments. The focus is on obtaining statistical guarantees on generalization. This is a highly relevant direction within the growing body of work on decentralized training. The paper is generally well written, contains very original ideas, and I was very excited to read it. The main reason I didn't give a higher rating was the limitations listed at the beginning of Sec 5. I commend the authors for acknowledging them.


Reviews: Optimal Statistical Rates for Decentralised Non-Parametric Regression with Linear Speed-Up

Neural Information Processing Systems

This paper provides a nice and clean characterization of a decentralized learning problem. The result is perhaps unsurprising in its form, but the analysis is far from trivial. There are some nontrivial assumptions required for the results to hold, which perhaps limit the scope of the contribution but do suggest interesting avenues for future research in this increasingly important area. Overall, this is a solid contribution and should be of interest to NeurIPS attendees who work in optimization and distributed systems.


Optimal Statistical Rates for Decentralised Non-Parametric Regression with Linear Speed-Up

Neural Information Processing Systems

We analyse the learning performance of Distributed Gradient Descent in the context of multi-agent decentralised non-parametric regression with the square loss function when i.i.d. samples are assigned to agents. We show that if agents hold sufficiently many samples with respect to the network size, then Distributed Gradient Descent achieves optimal statistical rates with a number of iterations that scales, up to a threshold, with the inverse of the spectral gap of the gossip matrix divided by the number of samples owned by each agent raised to a problem-dependent power. The presence of the threshold comes from statistics: it encodes the existence of a "big data" regime where the number of required iterations does not depend on the network topology. In this regime, Distributed Gradient Descent achieves optimal statistical rates with the same order of iterations as gradient descent run with all the samples in the network.
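For readers unfamiliar with the algorithm being analysed, the following is a minimal sketch of the generic gossip-plus-local-gradient template behind Distributed Gradient Descent, written here for plain least squares: each agent takes a gradient step on its local square loss and mixes its iterate with its neighbours through a gossip matrix. The parametric least-squares setting, the step size, and the iteration count are illustrative assumptions; the paper studies the non-parametric setting with specific step-size and stopping-time choices.

```python
import numpy as np

def distributed_gradient_descent(X_agents, y_agents, W, step=0.01, iters=200):
    """Sketch of Distributed Gradient Descent over a gossip matrix.

    X_agents[v], y_agents[v] are the samples held by agent v; W is a doubly
    stochastic gossip matrix supported on the communication graph.
    """
    n_agents = len(X_agents)
    d = X_agents[0].shape[1]
    theta = np.zeros((n_agents, d))                 # one iterate per agent
    for _ in range(iters):
        grads = np.zeros_like(theta)
        for v in range(n_agents):
            X, y = X_agents[v], y_agents[v]
            residual = X @ theta[v] - y
            grads[v] = X.T @ residual / len(y)      # local square-loss gradient
        theta = W @ theta - step * grads            # gossip average, then gradient step
    return theta
```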


Collaborative Learning with Shared Linear Representations: Statistical Rates and Optimal Algorithms

Niu, Xiaochun, Su, Lili, Xu, Jiaming, Yang, Pengkun

arXiv.org Machine Learning

Collaborative learning enables multiple clients to learn shared feature representations across local data distributions, with the goal of improving model performance and reducing overall sample complexity. While empirical evidence shows the success of collaborative learning, a theoretical understanding of the optimal statistical rate remains lacking, even in linear settings. In this paper, we identify the optimal statistical rate when clients share a common low-dimensional linear representation. Specifically, we design a spectral estimator with local averaging that approximates the optimal solution to the least squares problem. We establish a minimax lower bound to demonstrate that our estimator achieves the optimal error rate. Notably, the optimal rate reveals two distinct phases. In typical cases, our rate matches the standard rate based on the parameter counting of the linear representation. However, a statistical penalty arises in collaborative learning when there are too many clients or when local datasets are relatively small. Furthermore, our results, unlike existing ones, show that, at a system level, collaboration always reduces overall sample complexity compared to independent client learning. In addition, at an individual level, we provide a more precise characterization of when collaboration benefits a client in transfer learning and private fine-tuning.
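As a rough illustration of a spectral estimator with local averaging, the sketch below has each client form a local moment matrix from its own data, averages those matrices at the server, and returns the top-k eigenvectors of the average as the estimated shared subspace. The particular moment matrix used here (a sum of y_i^2 x_i x_i^T terms) is one common construction and is an assumption; the paper's estimator and its averaging scheme may be constructed differently.

```python
import numpy as np

def spectral_subspace_estimate(X_clients, y_clients, k):
    """Sketch of a spectral estimator with local averaging for a shared
    k-dimensional linear representation across clients."""
    d = X_clients[0].shape[1]
    M = np.zeros((d, d))
    for X, y in zip(X_clients, y_clients):
        M += (X.T * (y ** 2)) @ X / len(y)          # local moment matrix
    M /= len(X_clients)                              # average across clients
    _, eigvecs = np.linalg.eigh(M)                   # eigenvalues in ascending order
    return eigvecs[:, -k:]                           # top-k eigenvectors span the estimate
```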


How Projected Gradient Descent works in Machine Learning pipelines part1

#artificialintelligence

Abstract: This paper addresses a distributed convex optimization problem with a class of coupled constraints, which arise in a multi-agent system composed of multiple communities modeled by cliques. First, we propose a fully distributed gradient-based algorithm with a novel operator inspired by the convex projection, called the clique-based projection. Next, we scrutinize the convergence properties for both diminishing and fixed step sizes. For diminishing step sizes, we show convergence to an optimal solution under the assumptions of smoothness of the objective function and compactness of the constraint set. Additionally, when the objective function is strongly monotone, strict convergence to the unique solution is proved without the compactness assumption.
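The post's subject, projected gradient descent, can be summarised by the generic sketch below: take a gradient step, then project back onto the feasible set. The paper replaces the Euclidean projection with its clique-based projection tailored to coupled constraints; that operator is not reproduced here, and the example projection (onto the non-negative orthant) is purely illustrative.

```python
import numpy as np

def projected_gradient_descent(grad, project, x0, step=0.05, iters=500):
    """Generic projected gradient descent: gradient step, then projection
    back onto the feasible convex set."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = project(x - step * grad(x))
    return x

# Example: minimise ||x - c||^2 over the non-negative orthant.
c = np.array([1.0, -2.0, 0.5])
x_star = projected_gradient_descent(
    grad=lambda x: 2.0 * (x - c),
    project=lambda x: np.maximum(x, 0.0),
    x0=np.zeros(3),
)
```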

